Agglomerative Fuzzy K-Means Clustering Algorithm
Abstract
Introduction
Clustering is a process of grouping a set of objects into clusters so that objects in the same cluster are highly similar to one another but very dissimilar to objects in other clusters. The K-Means algorithm is well known for its efficiency in clustering large data sets. Fuzzy versions of the K-Means algorithm have been reported by Ruspini and Bezdek, in which each pattern is allowed to have membership in all clusters rather than a distinct membership in one single cluster. Numerous problems in real-world applications, such as pattern recognition and computer vision, can be tackled effectively by fuzzy K-Means algorithms, see, for instance.
There are two major issues in applying K-Means-type (nonfuzzy or fuzzy) algorithms in cluster analysis. The first issue is that the number of clusters k must be specified in advance as an input to these algorithms, whereas in a real data set k is usually unknown. In practice, different values of k are tried, and cluster validation techniques are used to evaluate the clustering results and determine the best value of k.
The second issue is that K-Means-type algorithms use alternating minimization methods to solve nonconvex optimization problems when finding cluster solutions. These algorithms require a set of initial cluster centres to start, and different sets of initial cluster centres often lead to different clustering results. Therefore, K-Means-type algorithms are very sensitive to the initial cluster centres. Usually, these algorithms are run with different initial guesses of the cluster centres, and the results are compared to determine the best clustering result. One way is to select the clustering result with the least objective function value as formulated in the K-Means-type algorithms, see, for instance. In addition, cluster validation techniques can be employed to select the best clustering result, see, for instance.
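The alternating minimization and the restart-and-compare strategy described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the paper: it uses Bezdek-style fuzzy K-Means with squared Euclidean distance and a fuzzifier m (assumed here to be 2), draws the initial centres at random from the data, and keeps the run with the least objective value. All function names (`fuzzy_kmeans`, `best_of_restarts`, and the helpers) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_memberships(data, centres, m):
    """Membership step for fixed centres (assumes fuzzifier m > 1)."""
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in data:
        d = [sq_dist(x, c) for c in centres]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # The point coincides with one or more centres: split membership among them.
            row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centres))]
        else:
            row = [1.0 / sum((di / dl) ** exponent for dl in d) for di in d]
        memberships.append(row)
    return memberships


def update_centres(data, memberships, m, old_centres):
    """Centre step: each centre is the mean of the data weighted by membership**m."""
    new_centres = []
    for i, old in enumerate(old_centres):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        if total == 0.0:
            new_centres.append(list(old))  # no support at all: keep the previous centre
            continue
        new_centres.append([sum(w * x[t] for w, x in zip(weights, data)) / total
                            for t in range(len(old))])
    return new_centres


def objective(data, centres, memberships, m):
    """Fuzzy K-Means objective: sum of u**m * squared distance over points and clusters."""
    return sum(u ** m * sq_dist(x, c)
               for x, row in zip(data, memberships)
               for u, c in zip(row, centres))


def fuzzy_kmeans(data, k, m=2.0, max_iter=300, tol=1e-10, rng=random):
    """One alternating-minimization run from k centres sampled from the data."""
    centres = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):
        memberships = update_memberships(data, centres, m)
        new_centres = update_centres(data, memberships, m, centres)
        shift = max(sq_dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    memberships = update_memberships(data, centres, m)
    return centres, memberships, objective(data, centres, memberships, m)


def best_of_restarts(data, k, n_restarts=10, m=2.0, seed=0):
    """Run from several random initialisations and keep the least-objective result."""
    rng = random.Random(seed)
    runs = [fuzzy_kmeans(data, k, m=m, rng=rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centres, memberships, value = best_of_restarts(points, k=2)
    print(sorted(centres), value)
```

Each call to `fuzzy_kmeans` may stop at a different local minimum depending on the sampled centres, which is the sensitivity discussed above; `best_of_restarts` applies the least-objective selection rule, and a cluster validity index could be used as the selection key instead.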
Other approaches have been proposed to address this issue, such as using a genetic algorithm to select better initial seed values for the K-Means algorithm. Recently, Arthur and Vassilvitskii proposed and studied a careful seeding of the initial cluster centres to improve clustering results. In this paper, we propose an agglomerative fuzzy K-Means clustering algorithm for numerical data to tackle the above two issues in the application of K-Means-type clustering algorithms. The new algorithm extends the standard fuzzy K-Means algorithm by introducing a penalty term into the objective function, which makes the clustering process insensitive to the initial cluster centres. The new algorithm can produce more consistent clustering results from different sets of initial cluster centres. Combined with cluster validation techniques, it can also determine the number of clusters in a data set. Experimental results have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
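The excerpt does not state the form of the penalty term, so the sketch below is only one plausible reading, written in plain Python. It assumes an entropy-style penalty, minimizing Σ_l Σ_j w_lj D_lj + λ Σ_l Σ_j w_lj log w_lj subject to each point's memberships summing to 1, which makes each membership a softmax of -D_lj/λ. It also assumes an agglomerative schedule: start from many centres (`k_start`), raise λ step by step, and merge centres that end up within a tolerance `eps` of each other, recording how many distinct centres remain. The penalty form, the λ schedule, and every name used (`agglomerative_sweep`, `merge_close`, `eps`, `lambdas`) are assumptions for illustration, not the authors' published algorithm.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def entropy_memberships(data, centres, lam):
    """Memberships under an ASSUMED entropy penalty (the source does not give its form).

    Minimising sum(w * D) + lam * sum(w * log w) with each row summing to 1 gives a
    softmax of -D / lam: small lam is near-hard, large lam pushes towards equal shares.
    """
    rows = []
    for x in data:
        scores = [-sq_dist(x, c) / lam for c in centres]
        top = max(scores)  # subtract the maximum before exp() for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def weighted_centres(data, memberships, old_centres):
    """Centre step: membership-weighted mean of the data for each cluster."""
    new_centres = []
    for i, old in enumerate(old_centres):
        weights = [row[i] for row in memberships]
        total = sum(weights)
        if total == 0.0:
            new_centres.append(list(old))  # every weight underflowed: keep the old centre
            continue
        new_centres.append([sum(w * x[t] for w, x in zip(weights, data)) / total
                            for t in range(len(old))])
    return new_centres


def penalised_fuzzy_kmeans(data, centres, lam, max_iter=500, tol=1e-10):
    """Alternate the membership and centre steps at a fixed penalty weight lam."""
    for _ in range(max_iter):
        memberships = entropy_memberships(data, centres, lam)
        new_centres = weighted_centres(data, memberships, centres)
        shift = max(sq_dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres


def merge_close(centres, eps):
    """Keep one representative of every group of centres closer than eps."""
    kept = []
    for c in centres:
        if all(sq_dist(c, other) > eps ** 2 for other in kept):
            kept.append(c)
    return kept


def agglomerative_sweep(data, k_start, lambdas, eps=1e-3, seed=0):
    """Hypothetical schedule: raise lam step by step, merging coincident centres.

    Returns a list of (lam, number of distinct centres) pairs.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(data, k_start)]
    history = []
    for lam in lambdas:
        centres = merge_close(penalised_fuzzy_kmeans(data, centres, lam), eps)
        history.append((lam, len(centres)))
    return history


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for lam, count in agglomerative_sweep(points, 4, [0.1, 5.0, 20.0, 500.0]):
        print(lam, count)
```

With a small λ the memberships are nearly hard and the starting centres stay apart; as λ grows, centres serving the same underlying group are pulled onto the same point and merged, so the count never increases along the sweep. A count that stays stable over a range of λ is a natural candidate for k, which could then be checked with a cluster validation index, in the spirit of the paragraph above.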
Similar Resources
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis, and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement over the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Survey of Clustering Applications
Data mining is the process of collecting and analyzing useful patterns from huge amounts of data. It has five major functions, and clustering is one of them. In clustering, similar data items are grouped together: the items in one cluster are alike, while they differ from the items in other clusters. In image segmentation, the clustering approach is used for making segments of images and i...
Agglomerative Mean Shift Cluster Using Shortest Path and Fuzzification Algorithm
In this research paper, an agglomerative mean shift with fuzzy clustering algorithm is proposed for numerical and image data. It extends the standard fuzzy C-Means algorithm by introducing a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. The new shortest path and fuzzification algorithm can produce more consistent c...
Optimization of Fuzzy Clustering Criteria by a Hybrid PSO and Fuzzy C-Means Clustering Algorithm
This paper presents an efficient hybrid method, namely fuzzy particle swarm optimization (FPSO) and fuzzy c-means (FCM) algorithms, to solve the fuzzy clustering problem, especially for large problem sizes. When the problem becomes large, the FCM algorithm may result in an uneven distribution of data, making it difficult to find an optimal solution in a reasonable amount of time. The PSO algorithm does find a go...
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is applicable only to crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...